Add some additional information to customize the knitted document:
date: "September 24, 2020"
output:
  html_document:
    number_sections: yes
    theme: cerulean
    toc: yes
    toc_depth: 5
    toc_float: yes
  pdf_document:
    toc: yes
    toc_depth: '5'
This adds a table of contents (toc) that floats beside the text in the HTML output (toc_float), numbers the sections, and changes the colors (theme: cerulean).
To find your favorite R Markdown theme, browse this gallery: https://www.datadreaming.org/post/r-markdown-theme-gallery/
knitr::opts_chunk$set(cache=TRUE, fig.path='figures/', fig.width=8, fig.height=5)
This caches chunk results (cache=TRUE), saves all figures in the figures/ directory (fig.path), and sets the default figure size to 8 by 5 inches.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Rmarkdown Cheatsheet: https://rmarkdown.rstudio.com/lesson-15.html
Hash signs (#) indicate headers.
The number of hashes equals the header level (# for a top-level header, ## for the next level, and so on).
Placing a single asterisk on either side of a phrase makes it italic (*italic*).
Double asterisks make a word or phrase bold (**bold**).
Triple asterisks make a word or phrase bold and italic (***bold italic***).
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
Execute this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
You can also embed plots, for example:
(Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.)
echo=FALSE displays only the output, not the code.
Some more chunk options:
* Use echo=FALSE to avoid having the code itself shown.
* Use results="hide" to avoid having any results printed.
* Use eval=FALSE to have the code shown but not evaluated.
* Use warning=FALSE and message=FALSE to hide any warnings or messages produced.
* Use fig.height and fig.width to control the size of the figures produced (in inches).
Naming chunks is good practice (the plot chunk above was named pressure):
* names help you navigate around the document
* figures are saved under the chunk's name
(check the Rproject directory after knitting)
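The plot chunk referred to here did not survive in these notes. Assuming it follows the default RStudio R Markdown template, the chunk named pressure contains a single plot() call on R's built-in pressure dataset, with the chunk header {r pressure, echo=FALSE} so that only the plot appears in the knitted document. With fig.path='figures/' set earlier, knitr would then save the figure as something like figures/pressure-1.png.

```r
# Assumed chunk header in the .Rmd file: {r pressure, echo=FALSE}
# `pressure` is a built-in data frame: vapor pressure of mercury (pressure)
# measured at a series of temperatures (temperature)
plot(pressure)
```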
You can also include images from your local computer or from the web:
You can type out tables by hand:
| col name | | |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
Alternatively, you can use the knitr package to make markdown tables from data frames:
| speed | dist |
|---|---|
| 4 | 2 |
| 4 | 10 |
| 7 | 4 |
| 7 | 22 |
| 8 | 16 |
| 9 | 10 |
Columns can be aligned left, right, or center.
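The chunk that produced the table above is not shown. A minimal sketch, assuming the table came from the first six rows of the built-in cars data frame (the values match head(cars)): knitr::kable() turns a data frame into a markdown table, and its align argument takes one letter per column, "l" (left), "r" (right), or "c" (center).

```r
library(knitr)

# Markdown table from the first six rows of the built-in cars data
kable(head(cars))

# One alignment letter per column: speed left-aligned, dist centered
aligned <- kable(head(cars), align = c("l", "c"))
aligned
```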
When you knit the file, an HTML file containing the code and output will be saved alongside it (click the Knit button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor (Viewer tab).
Rproject Benefits:
No need to set the working directory. All paths are relative to the directory containing the Rproject.
Whenever you open your project, the working directory is automatically set to where your project is. This means your code will not break when you work on a different computer.
RStudio projects allow you to open multiple projects at the same time with each open to its own project directory. This allows you to keep multiple projects open without them interfering with each other.
Good organization / project layout will:
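Project Management tips:
* keep generated output in a results directory
* keep scripts in a src directory
* give files descriptive names (e.g. fig1_pca_communitycomposition.jpg, not Rplot1.jpg)
* use symbolic links (ln -s)
Data for this workshop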
Following good project management practices, make a new directory called data and download the data we will be playing with in this workshop into that directory:
In the Terminal tab:
mkdir data
cd data
wget https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv
(If wget is not installed, as on a default macOS setup, use curl -O with the same URL instead.)
We will use the data later, but we can get a general sense of the data by looking at it in the terminal, which will help us decide how to load it into R later:
wc -l gapminder_data.csv
head gapminder_data.csv
cd -
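The same quick look can also be taken from inside R. This sketch writes a tiny stand-in CSV (two data rows copied from the gapminder output later in these notes) to a temporary file so it runs on its own; with the real data you would read "data/gapminder_data.csv" instead. length(readLines()) roughly corresponds to wc -l, and head() on the lines to the shell head command.

```r
# Hypothetical stand-in file; in the workshop use "data/gapminder_data.csv"
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop,continent,lifeExp,gdpPercap",
  "Afghanistan,1952,8425333,Asia,28.801,779.4453",
  "Afghanistan,1957,9240934,Asia,30.332,820.853"
), csv_path)

csv_lines <- readLines(csv_path)
length(csv_lines)        # like `wc -l`: number of lines, header included
head(csv_lines, n = 2)   # like `head`, limited to the first 2 lines
```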
Go to your GitHub account and make a new repository. Do NOT initialize it with a README.
Follow the instructions on the next page
(in the Terminal tab)
echo "# SkillPill_ReproducibleR" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/maggimars/SkillPill_ReproducibleR.git
git push -u origin master
README.md is a markdown file; in many ways it is just like this R Markdown file and uses similar syntax.
Try also adding your data directory to your GitHub repository!
Alternatively, you can use the RStudio interface to version control with Git: https://swcarpentry.github.io/git-novice/14-supplemental-rstudio/
(I prefer command line)
?function_name
If you can't quite remember a function's name, search the help with ??function_name
Pro-tip: from within a function's help page, you can highlight code in the Examples section and press Ctrl+Return to run it in the RStudio console. This gives you a quick way to get a feel for how a function works.
?kable
For special operators, use quotes, e.g. ?"<-"
Without any arguments, vignette() will list all vignettes for all installed packages; vignette(package="package-name") will list all available vignettes for package-name, and vignette("vignette-name") will open the specified vignette.
And then there is always google.
We already looked at the sample data in the Terminal and saw that it is a .csv file with 1705 lines and a header row, i.e. 1704 rows of data.
gapminder <- read.csv("data/gapminder_data.csv", header = TRUE)
View data in another tab with View()
When your data is in a GitHub repo, you can also read it directly from the repo:
library(data.table) # you might need to install this package first: install.packages("data.table")
gapminder <- fread("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv", header = TRUE)
To get more information about the data:
dim()
str()
summary()
length()
nrow()
ncol()
names()
head()/tail()
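As a quick sketch of what these return, here they are applied to R's built-in cars data frame (used instead of gapminder so the example runs on its own). Note that for a data frame, length() counts columns, not rows.

```r
# `cars` is a built-in data frame with 50 rows and 2 columns
dim(cars)      # number of rows and columns
nrow(cars)
ncol(cars)
names(cars)    # column names
length(cars)   # for a data frame: the number of columns
str(cars)      # column types and first values
summary(cars)  # per-column summary statistics
head(cars, 3)  # first 3 rows
tail(cars, 3)  # last 3 rows
```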
Other data types we might see: factors
str(factor(gapminder$continent))
## Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
ordering_example <- factor(gapminder$continent, levels= c("Oceania", "Asia", "Europe", "Africa", "Americas"))
str(ordering_example)
## Factor w/ 5 levels "Oceania","Asia",..: 2 2 2 2 2 2 2 2 2 2 ...
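as.character() - changes factors back into characters
as.numeric() - converts to numbers, but on a factor use as.character() first; otherwise you get the factor's integer level codes rather than its values
You can also set stringsAsFactors = FALSE when reading in data (this is already the default for fread() and, since R 4.0.0, for read.csv())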
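Why as.character() first? A factor stores integer codes that point to its levels, and as.numeric() returns those codes. A small sketch with a made-up factor of numbers:

```r
# Hypothetical example: numbers that ended up stored as a factor
f <- factor(c("10", "5", "20"))
levels(f)                      # "10" "20" "5" (levels are sorted as text)

as.numeric(f)                  # 1 3 2: the level codes, not the values
as.numeric(as.character(f))    # 10 5 20: the actual values
```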
(gapminder$continent == "Asia")[c(1:100)]
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [97] TRUE TRUE TRUE TRUE
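Removing rows and columns (subsetting): gapminder[rows, columns]. Since gapminder was read with fread(), it is a data.table, which is why the examples below can refer to columns by bare name (e.g. country == "Zimbabwe") and drop columns with -c("..."); with a plain data.frame, write gapminder[gapminder$country == "Zimbabwe", ] instead.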
by number
gapminder[-1,]
## country year pop continent lifeExp gdpPercap
## 1: Afghanistan 1957 9240934 Asia 30.332 820.8530
## 2: Afghanistan 1962 10267083 Asia 31.997 853.1007
## 3: Afghanistan 1967 11537966 Asia 34.020 836.1971
## 4: Afghanistan 1972 13079460 Asia 36.088 739.9811
## 5: Afghanistan 1977 14880372 Asia 38.438 786.1134
## ---
## 1699: Zimbabwe 1987 9216418 Africa 62.351 706.1573
## 1700: Zimbabwe 1992 10704340 Africa 60.377 693.4208
## 1701: Zimbabwe 1997 11404948 Africa 46.809 792.4500
## 1702: Zimbabwe 2002 11926563 Africa 39.989 672.0386
## 1703: Zimbabwe 2007 12311143 Africa 43.487 469.7093
gapminder[,-1]
## year pop continent lifeExp gdpPercap
## 1: 1952 8425333 Asia 28.801 779.4453
## 2: 1957 9240934 Asia 30.332 820.8530
## 3: 1962 10267083 Asia 31.997 853.1007
## 4: 1967 11537966 Asia 34.020 836.1971
## 5: 1972 13079460 Asia 36.088 739.9811
## ---
## 1700: 1987 9216418 Africa 62.351 706.1573
## 1701: 1992 10704340 Africa 60.377 693.4208
## 1702: 1997 11404948 Africa 46.809 792.4500
## 1703: 2002 11926563 Africa 39.989 672.0386
## 1704: 2007 12311143 Africa 43.487 469.7093
drop multiple rows/columns …
drop or select columns by name
gapminder[, c("year", "pop", "continent")]
## year pop continent
## 1: 1952 8425333 Asia
## 2: 1957 9240934 Asia
## 3: 1962 10267083 Asia
## 4: 1967 11537966 Asia
## 5: 1972 13079460 Asia
## ---
## 1700: 1987 9216418 Africa
## 1701: 1992 10704340 Africa
## 1702: 1997 11404948 Africa
## 1703: 2002 11926563 Africa
## 1704: 2007 12311143 Africa
gapminder[ , -c("year", "pop", "continent")]
## country lifeExp gdpPercap
## 1: Afghanistan 28.801 779.4453
## 2: Afghanistan 30.332 820.8530
## 3: Afghanistan 31.997 853.1007
## 4: Afghanistan 34.020 836.1971
## 5: Afghanistan 36.088 739.9811
## ---
## 1700: Zimbabwe 62.351 706.1573
## 1701: Zimbabwe 60.377 693.4208
## 1702: Zimbabwe 46.809 792.4500
## 1703: Zimbabwe 39.989 672.0386
## 1704: Zimbabwe 43.487 469.7093
select rows conditionally
gapminder[country == "Zimbabwe",]
## country year pop continent lifeExp gdpPercap
## 1: Zimbabwe 1952 3080907 Africa 48.451 406.8841
## 2: Zimbabwe 1957 3646340 Africa 50.469 518.7643
## 3: Zimbabwe 1962 4277736 Africa 52.358 527.2722
## 4: Zimbabwe 1967 4995432 Africa 53.995 569.7951
## 5: Zimbabwe 1972 5861135 Africa 55.635 799.3622
## 6: Zimbabwe 1977 6642107 Africa 57.674 685.5877
## 7: Zimbabwe 1982 7636524 Africa 60.363 788.8550
## 8: Zimbabwe 1987 9216418 Africa 62.351 706.1573
## 9: Zimbabwe 1992 10704340 Africa 60.377 693.4208
## 10: Zimbabwe 1997 11404948 Africa 46.809 792.4500
## 11: Zimbabwe 2002 11926563 Africa 39.989 672.0386
## 12: Zimbabwe 2007 12311143 Africa 43.487 469.7093
gapminder[country!= "Afghanistan",]
## country year pop continent lifeExp gdpPercap
## 1: Albania 1952 1282697 Europe 55.230 1601.0561
## 2: Albania 1957 1476505 Europe 59.280 1942.2842
## 3: Albania 1962 1728137 Europe 64.820 2312.8890
## 4: Albania 1967 1984060 Europe 66.220 2760.1969
## 5: Albania 1972 2263554 Europe 67.690 3313.4222
## ---
## 1688: Zimbabwe 1987 9216418 Africa 62.351 706.1573
## 1689: Zimbabwe 1992 10704340 Africa 60.377 693.4208
## 1690: Zimbabwe 1997 11404948 Africa 46.809 792.4500
## 1691: Zimbabwe 2002 11926563 Africa 39.989 672.0386
## 1692: Zimbabwe 2007 12311143 Africa 43.487 469.7093
gapminder[lifeExp >= 80,]
## country year pop continent lifeExp gdpPercap
## 1: Australia 2002 19546792 Oceania 80.370 30687.75
## 2: Australia 2007 20434176 Oceania 81.235 34435.37
## 3: Canada 2007 33390141 Americas 80.653 36319.24
## 4: France 2007 61083916 Europe 80.657 30470.02
## 5: Hong Kong China 1997 6495918 Asia 80.000 28377.63
## 6: Hong Kong China 2002 6762476 Asia 81.495 30209.02
## 7: Hong Kong China 2007 6980412 Asia 82.208 39724.98
## 8: Iceland 2002 288030 Europe 80.500 31163.20
## 9: Iceland 2007 301931 Europe 81.757 36180.79
## 10: Israel 2007 6426679 Asia 80.745 25523.28
## 11: Italy 2002 57926999 Europe 80.240 27968.10
## 12: Italy 2007 58147733 Europe 80.546 28569.72
## 13: Japan 1997 125956499 Asia 80.690 28816.58
## 14: Japan 2002 127065841 Asia 82.000 28604.59
## 15: Japan 2007 127467972 Asia 82.603 31656.07
## 16: New Zealand 2007 4115771 Oceania 80.204 25185.01
## 17: Norway 2007 4627926 Europe 80.196 49357.19
## 18: Spain 2007 40448191 Europe 80.941 28821.06
## 19: Sweden 2002 8954175 Europe 80.040 29341.63
## 20: Sweden 2007 9031088 Europe 80.884 33859.75
## 21: Switzerland 2002 7361757 Europe 80.620 34480.96
## 22: Switzerland 2007 7554661 Europe 81.701 37506.42
## country year pop continent lifeExp gdpPercap
using & and |
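A minimal sketch of combining conditions: & keeps rows where both conditions are TRUE, and | keeps rows where at least one is TRUE. It uses a small data frame built from a few rows of the lifeExp >= 80 output above, in base R syntax; with the data.table gapminder the same filter can be written gapminder[continent == "Europe" & year == 2007, ].

```r
# A few rows copied from the lifeExp >= 80 output above
small <- data.frame(
  country   = c("Australia", "Canada", "Japan", "Iceland", "Norway"),
  year      = c(2007, 2007, 2007, 2002, 2007),
  continent = c("Oceania", "Americas", "Asia", "Europe", "Europe"),
  lifeExp   = c(81.235, 80.653, 82.603, 80.500, 80.196),
  stringsAsFactors = FALSE
)

# & : European rows from 2007 (both conditions must hold)
both <- small[small$continent == "Europe" & small$year == 2007, ]

# | : rows from Asia OR with life expectancy above 81 (either condition)
either <- small[small$continent == "Asia" | small$lifeExp > 81, ]

both
either
```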
if, if else, and for
allows us to control when an action is taken
# if
if (condition is true) {
perform action
}
# if ... else
if (condition is true) {
perform action
} else { # that is, if the condition is false,
perform alternative action
}
examples:
x <- 8
if (x >= 10) {
print("x is greater than or equal to 10")
}
x
## [1] 8
x <- 8
if (x >= 10) {
print("x is greater than or equal to 10")
} else {
print("x is less than 10")
}
## [1] "x is less than 10"
x <- 8
if (x >= 10) {
print("x is greater than or equal to 10")
} else if (x > 5) {
print("x is greater than 5, but less than 10")
} else {
print("x is less than or equal to 5")
}
## [1] "x is greater than 5, but less than 10"
Challenge:
Use an if() statement to print a suitable message reporting whether there are any records from 2002 in the gapminder dataset:
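One possible approach: year == 2002 gives one TRUE/FALSE per row, and any() collapses that into the single TRUE/FALSE that if() needs (since R 4.2.0, passing a condition longer than one to if() is an error). The sketch uses a hypothetical stand-in with one row per gapminder survey year (1952 to 2007 in 5-year steps); with the real data, test any(gapminder$year == 2002).

```r
# Hypothetical stand-in: one row per survey year, 1952 to 2007
records <- data.frame(year = seq(1952, 2007, by = 5))

if (any(records$year == 2002)) {
  msg <- "Record(s) for the year 2002 found."
} else {
  msg <- "No records from 2002 found."
}
print(msg)
```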
Looping
If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a for() loop will do the job.
Basic Structure:
for (iterator in set of values) {
do a thing
}
Example
for (i in 1:10) {
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
Nested for loop:
for (i in 1:5) {
for (j in c('a', 'b', 'c', 'd', 'e')) {
print(paste(i,j))
}
}
## [1] "1 a"
## [1] "1 b"
## [1] "1 c"
## [1] "1 d"
## [1] "1 e"
## [1] "2 a"
## [1] "2 b"
## [1] "2 c"
## [1] "2 d"
## [1] "2 e"
## [1] "3 a"
## [1] "3 b"
## [1] "3 c"
## [1] "3 d"
## [1] "3 e"
## [1] "4 a"
## [1] "4 b"
## [1] "4 c"
## [1] "4 d"
## [1] "4 e"
## [1] "5 a"
## [1] "5 b"
## [1] "5 c"
## [1] "5 d"
## [1] "5 e"
storing results
output_vector <- c()
for (i in 1:5) {
for (j in c('a', 'b', 'c', 'd', 'e')) {
temp_output <- paste(i, j)
output_vector <- c(output_vector, temp_output)
}
}
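Growing output_vector with c() copies the whole vector on every iteration, which gets slow for long loops. A common alternative, sketched here, is to preallocate a vector of the final length and fill it by position; for this particular example, paste() on whole vectors gives the same result without any loop.

```r
letter_set <- c('a', 'b', 'c', 'd', 'e')

# Preallocate one slot per (i, j) combination, then fill by position
output_vector2 <- character(5 * length(letter_set))
k <- 0
for (i in 1:5) {
  for (j in letter_set) {
    k <- k + 1
    output_vector2[k] <- paste(i, j)
  }
}

# Vectorized version: rep() repeats each number 5 times, letter_set is recycled
vectorized <- paste(rep(1:5, each = length(letter_set)), letter_set)
identical(output_vector2, vectorized)
```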
Challenge:
Write a script that loops through the gapminder data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.
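One possible approach, sketched on a small stand-in data frame whose values are copied from rows shown earlier (Afghanistan, Zimbabwe, Albania); with the real data, loop over unique(gapminder$continent) and subset gapminder$lifeExp the same way. A mean of exactly 50 is reported as "larger" here.

```r
# Stand-in rows copied from the gapminder output above
toy <- data.frame(
  continent = c("Asia", "Asia", "Africa", "Africa", "Europe", "Europe"),
  lifeExp   = c(28.801, 30.332, 48.451, 62.351, 55.230, 59.280),
  stringsAsFactors = FALSE
)

status <- character(0)
for (cont in unique(toy$continent)) {
  mean_le <- mean(toy$lifeExp[toy$continent == cont])
  status[cont] <- if (mean_le < 50) "smaller" else "larger"
  cat(cont, "- mean life expectancy", round(mean_le, 1), "is",
      status[cont], "than 50 years\n")
}
```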
Functions: reusable! (and therefore reproducible!)
We often start by writing a function within an interactive session.
Let's write a function that converts Fahrenheit to Celsius (because I am moving back to America and I'm going to need this).
fahr_to_celc <- function(temp) {
celc <- ((temp - 32) * (5 / 9))
return(celc)
}
Get body temperature in Celsius (seems to be important these days):
fahr_to_celc(98.6)
## [1] 37
Checking inputs with stopifnot():
fahr_to_celc <- function(temp) {
stopifnot(is.numeric(temp))
celc <- ((temp - 32) * (5 / 9))
return(celc)
}
What happens if you call with a number?
What if you call with a string?
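A quick sketch of both cases (the function is repeated so the block runs on its own). With a number the conversion works; with a string, stopifnot() stops with an error naming the failed check, which tryCatch() captures here instead of halting the script.

```r
fahr_to_celc <- function(temp) {
  stopifnot(is.numeric(temp))
  celc <- ((temp - 32) * (5 / 9))
  return(celc)
}

fahr_to_celc(212)   # a number: works

# a string: stopifnot() raises an error; keep its message instead of halting
msg <- tryCatch(fahr_to_celc("212"), error = function(e) conditionMessage(e))
msg
```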
Combining Functions:
Define two functions
Define a new function that calls both of these functions to convert Fahrenheit to kelvin
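One way to do this, assuming the second function converts Celsius to kelvin (the name celc_to_kelvin is a hypothetical choice): chain it with fahr_to_celc() so the Fahrenheit-to-kelvin conversion reuses code that is already checked.

```r
fahr_to_celc <- function(temp) {
  stopifnot(is.numeric(temp))
  (temp - 32) * (5 / 9)
}

# Hypothetical helper: Celsius to kelvin
celc_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  temp + 273.15
}

# Combine the two: Fahrenheit -> Celsius -> kelvin
fahr_to_kelvin <- function(temp) {
  celc_to_kelvin(fahr_to_celc(temp))
}

fahr_to_kelvin(32)    # freezing point of water
fahr_to_kelvin(212)   # boiling point of water
```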
A more useful example:
Calculate gross domestic product in our data set
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat) {
gdp <- dat$pop * dat$gdpPercap
return(gdp)
}
calcGDP(head(gapminder))
## [1] 6567086330 7585448670 8758855797 9648014150 9678553274 11697659231
But that is not super useful. Let's add more arguments so we can extract results per country and per year:
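# Takes a dataset, optionally keeps only the requested years and/or countries,
# and adds a gdp column: the population column multiplied by the GDP per capita column.
calcGDP <- function(dat, year=NULL, country=NULL) {
  if (!is.null(year)) {
    # Build the row filter outside `[`: inside a data.table's `[`, a bare `year`
    # refers to the year column, not to this argument, so no rows would be dropped
    keep <- dat$year %in% year
    dat <- dat[keep, ]
  }
  if (!is.null(country)) {
    keep <- dat$country %in% country
    dat <- dat[keep, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp=gdp)
  return(new)
}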
The default arguments are NULL, so the year and country filters are only applied when those arguments are supplied:
head(calcGDP(gapminder, year=2007))
calcGDP(gapminder, country="Australia")
Challenge: Test out your GDP function by calculating the GDP for New Zealand in 1987. How does this differ from New Zealand’s GDP in 1952?
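A sketch of how the extra arguments answer questions like this one, on a small stand-in data frame holding Zimbabwe and Australia rows copied from the output earlier in these notes (so it leaves the New Zealand answer to you). The function is repeated, with its row filters computed outside `[`, so the block runs on its own; with the real data the call would look like calcGDP(gapminder, year = c(1952, 1987), country = "New Zealand").

```r
calcGDP <- function(dat, year = NULL, country = NULL) {
  if (!is.null(year)) {
    keep <- dat$year %in% year
    dat <- dat[keep, ]
  }
  if (!is.null(country)) {
    keep <- dat$country %in% country
    dat <- dat[keep, ]
  }
  cbind(dat, gdp = dat$pop * dat$gdpPercap)
}

# Stand-in rows copied from the gapminder output above
toy <- data.frame(
  country   = c("Zimbabwe", "Zimbabwe", "Australia"),
  year      = c(1952, 2007, 2007),
  pop       = c(3080907, 12311143, 20434176),
  gdpPercap = c(406.8841, 469.7093, 34435.37),
  stringsAsFactors = FALSE
)

zim <- calcGDP(toy, year = c(1952, 2007), country = "Zimbabwe")
zim
# How many times larger was the 2007 GDP than the 1952 GDP?
zim$gdp[zim$year == 2007] / zim$gdp[zim$year == 1952]
```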
Moving functions to R scripts and sourcing those scripts (best practices for data management!)
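A minimal sketch of the idea: keep function definitions in their own script and load them with source(). Here the script is written to a temporary file so the example runs anywhere; in a project it would be a file you edit directly, e.g. a hypothetical src/functions.R loaded with source("src/functions.R").

```r
# Stand-in for a file such as src/functions.R (hypothetical path)
script_path <- file.path(tempdir(), "functions.R")
writeLines(c(
  "fahr_to_celc <- function(temp) {",
  "  stopifnot(is.numeric(temp))",
  "  (temp - 32) * (5 / 9)",
  "}"
), script_path)

# source() runs the script, defining fahr_to_celc() in the current session
source(script_path)
fahr_to_celc(98.6)
```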